feat(perf): enhance performance by improving disk caching and applying other optimizations #556
Conversation
Force-pushed from af2170a to 221fb3d
Add persistent namespace caching to significantly improve warm start performance. Cached namespaces are loaded from disk instead of being rebuilt from scratch.

Key changes:
- Add NamespaceMetaData and NamespaceCacheData frozen dataclasses for cache serialization with validation fields (mtime, content_hash, python_executable, sys_path_hash)
- Add atomic cache writes using the temp file + rename pattern
- Add reverse dependency tracking for efficient library/variable change propagation (get_library_users, get_variables_users)
- Skip content hash computation when mtime AND size match
- Add ResourceMetaData for resource caching

Tests:
- Unit tests for PickleDataCache atomic writes (28 tests)
- Unit tests for NamespaceMetaData and cached entries (20 tests)
- Unit tests for ResourceMetaData cache keys (15 tests)
- Integration tests for namespace caching behavior (11 tests)
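A minimal sketch of the temp file + rename pattern named above. The PickleDataCache internals are not shown in this PR excerpt, so the function name and paths here are illustrative:

```python
import os
import pickle
import tempfile
from pathlib import Path


def atomic_pickle_write(path: Path, data: object) -> None:
    """Write pickle data atomically: readers never observe a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the target directory so the final rename
    # stays on the same filesystem and remains a single atomic operation.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_name, path)  # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp_name)  # never leave half-written cache files behind
        raise
```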
Force-pushed from 221fb3d to ec8f008
Optimize workspace-wide reference operations from O(D) to O(k), where D = total documents and k = documents actually referencing the target.

Changes:
- Add reverse index data structures in DocumentsCacheHelper to track which documents reference each keyword/variable
- Use stable (source, name) tuple keys resilient to cache invalidation
- Implement diff-based updates to handle removed references after edits
- Add get_keyword_ref_users() and get_variable_ref_users() for O(1) lookup
- Update Find References to use the reverse index with a workspace scan fallback
- Update unused keyword/variable detection to use the reverse index
I've included the second performance optimization, which can be found in this commit: 59e0314

Problem

Previously, workspace-wide reference operations (Find References, unused keyword/variable detection) required scanning all documents in the workspace for each lookup. This resulted in O(D) per-lookup complexity, where D is the total number of documents. Unused-detection checking of K keywords therefore required O(K × D) operations, causing noticeable delays in large workspaces.

Solution

Added a reverse index that maps each keyword/variable to the documents that reference it. This reduces lookup complexity from O(D) to O(k), where k is the number of documents that actually use the target (typically much smaller than D).

Architecture

Before: O(D) Workspace Scan

```mermaid
flowchart LR
subgraph "Find References (Before)"
A[Request: Find refs to 'My Keyword'] --> B[Scan ALL documents]
B --> C[doc_001.robot]
B --> D[doc_002.robot]
B --> E[doc_003.robot]
B --> F[...]
B --> G[doc_999.robot]
C --> H[Check namespace]
D --> H
E --> H
F --> H
G --> H
H --> I[Return matches]
end
```

After: O(k) Reverse Index Lookup

```mermaid
flowchart LR
subgraph "Find References (After)"
A[Request: Find refs to 'My Keyword'] --> B[Lookup reverse index]
B --> C["index[('source', 'My Keyword')]"]
C --> D[Return: doc_003, doc_047, doc_891]
D --> E[Scan only 3 documents]
E --> F[Return matches]
end
```

Data Structures

```mermaid
flowchart TB
subgraph "Reverse Index Structure"
direction TB
A["_keyword_ref_users<br/>dict[tuple[str, str], WeakSet[TextDocument]]"]
B["Key: (source, name)<br/>e.g. ('common.resource', 'My Keyword')"]
C["Value: WeakSet of documents<br/>that reference this keyword"]
A --> B
A --> C
end
subgraph "Forward Index (for diff-based updates)"
direction TB
D["_doc_keyword_refs<br/>WeakKeyDictionary[TextDocument, set]"]
E["Key: TextDocument"]
F["Value: set of (source, name) tuples<br/>this document references"]
D --> E
D --> F
end
```
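A minimal sketch of the reverse/forward index pair shown above, assuming a simplified stand-in TextDocument class; the real DocumentsCacheHelper tracks more state than this:

```python
from __future__ import annotations

from weakref import WeakKeyDictionary, WeakSet

RefKey = tuple[str, str]  # (source, name), e.g. ("common.resource", "My Keyword")


class TextDocument:  # stand-in for the real document class
    def __init__(self, uri: str) -> None:
        self.uri = uri


class ReverseRefIndex:
    def __init__(self) -> None:
        # Reverse index: (source, name) -> documents that reference it.
        self._keyword_ref_users: dict[RefKey, WeakSet[TextDocument]] = {}
        # Forward index: document -> keys it references (enables diff updates).
        self._doc_keyword_refs: WeakKeyDictionary[TextDocument, set[RefKey]] = (
            WeakKeyDictionary()
        )

    def update_document_refs(self, doc: TextDocument, new_refs: set[RefKey]) -> None:
        """Diff-based update: drop stale entries, then add new ones."""
        old_refs = self._doc_keyword_refs.get(doc, set())
        for key in old_refs - new_refs:  # references removed after an edit
            users = self._keyword_ref_users.get(key)
            if users is not None:
                users.discard(doc)
        for key in new_refs - old_refs:
            self._keyword_ref_users.setdefault(key, WeakSet()).add(doc)
        self._doc_keyword_refs[doc] = new_refs

    def get_keyword_ref_users(self, key: RefKey) -> WeakSet[TextDocument]:
        """O(1) lookup of the documents that reference a keyword."""
        return self._keyword_ref_users.get(key, WeakSet())
```

Find References then scans only get_keyword_ref_users(("common.resource", "My Keyword")) instead of the whole workspace; the WeakSet values let garbage-collected documents drop out of the index automatically.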
Extend namespace disk caching to include keyword references, variable references, and local variable assignments. This allows the analysis phase to be skipped entirely when loading from a valid cache.

Key changes:
- Add KeywordRefKey and VariableRefKey stable keys for serialization
- Serialize/restore keyword_references, variable_references, and local_variable_assignments in the namespace cache
- Implement a 10% staleness threshold: if more than 10% of cached references cannot be resolved, fall back to fresh analysis (see the sketch below)
- Track references when loading fully analyzed namespaces from cache
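A minimal sketch of the 10% staleness threshold, assuming a hypothetical resolve_reference callback that re-resolves a cached key against the current workspace:

```python
STALENESS_THRESHOLD = 0.10  # fall back to fresh analysis above 10% unresolved


def restore_references(cached_refs, resolve_reference):
    """Try to resolve cached reference keys; report staleness if too many fail.

    Returns the resolved references, or None if the cache is too stale
    and the caller should rerun the full analysis phase.
    """
    if not cached_refs:
        return {}
    resolved = {}
    unresolved = 0
    for key in cached_refs:
        target = resolve_reference(key)  # e.g. look up (source, name) again
        if target is None:
            unresolved += 1
        else:
            resolved[key] = target
    if unresolved / len(cached_refs) > STALENESS_THRESHOLD:
        return None  # too stale: caller falls back to fresh analysis
    return resolved
```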
- Format 4 source files to pass ruff style checks
- Fix filepath_base tests to use Path for cross-platform compatibility instead of hardcoded Unix byte strings for hash computation
Summary
This PR implements a comprehensive performance improvement plan that dramatically reduces Language Server startup time and improves edit responsiveness.
Key changes:
- Persist LibraryDoc objects to .robotcode_cache/*/
- Cache resource docs under resource/
- Share a single process pool for library loading (max_workers=min(cpu_count, 4))
- Add get_library_users() and get_variables_users() for instant change propagation

Testing Plan
Architecture
Architecture Overview
```mermaid
flowchart TB
    subgraph cache["Disk Cache (.robotcode_cache/)"]
        lib["libdoc/<br/>Library docs"]
        res["resource/<br/>Resource docs"]
        ns["namespace/<br/>Namespace state"]
    end
    subgraph perf["Performance Optimizations"]
        p1["Shared Process Pool"]
        p2["Resource Caching"]
        p3["Targeted Invalidation"]
        p4["Parallel Library Loading"]
        p5["O(1) Dependency Lookups"]
        p7["Namespace Caching"]
        p9["Atomic Writes"]
    end
    subgraph maps["Reverse Dependency Maps"]
        imp["_importers<br/>source → documents"]
        lu["_library_users<br/>lib → documents"]
        vu["_variables_users<br/>var → documents"]
    end
    p1 --> lib
    p2 --> res
    p7 --> ns
    p9 --> ns
    p3 --> imp
    p5 --> lu
    p5 --> vu
```
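A minimal sketch of the shared process pool from the diagram; the worker cap matches the PR's max_workers=min(cpu_count, 4), while load_library_spec is a hypothetical worker function:

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Optional

_shared_executor: Optional[ProcessPoolExecutor] = None


def get_shared_executor() -> ProcessPoolExecutor:
    """Lazily create one pool that all library loads share."""
    global _shared_executor
    if _shared_executor is None:
        _shared_executor = ProcessPoolExecutor(
            max_workers=min(os.cpu_count() or 1, 4)
        )
    return _shared_executor


# A document's imports can then be resolved in parallel:
# futures = [get_shared_executor().submit(load_library_spec, name)
#            for name in library_imports]  # load_library_spec is hypothetical
```

Sharing one executor avoids paying process startup cost per document, which matters most during a cold start when many libraries load at once.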
Cold vs Warm Start Flow

```mermaid
flowchart TB
    subgraph startup["IDE Startup"]
        open["User Opens VS Code"]
    end
    subgraph cold["Cold Start (No Cache) ~2-4 min"]
        c1["Parse .robot files"]
        c2["Resolve imports in parallel"]
        c3["Load libraries<br/>(shared executor)"]
        c4["Build namespaces"]
        c5["Save to cache<br/>(atomic write)"]
        c1 --> c2 --> c3 --> c4 --> c5
    end
    subgraph warm["Warm Start (Cache Hit) ~10-20 sec"]
        w1["Check .cache.pkl exists"]
        w2{"Validate:<br/>mtime + size?"}
        w3["Load (meta, spec)"]
        w4["Check environment identity"]
        w5["Restore namespace"]
        w1 --> w2
        w2 -->|"Match"| w3 --> w4 --> w5
        w2 -->|"Changed"| miss["Rebuild"]
    end
    subgraph runtime["Runtime Editing"]
        r1["File changed"]
        r2["O(1) lookup affected docs"]
        r3["Targeted invalidation"]
        r4["Rebuild only affected"]
        r1 --> r2 --> r3 --> r4
    end
    open --> cold
    open --> warm
    style cold fill:#ffcccc,stroke:#cc0000
    style warm fill:#ccffcc,stroke:#00cc00
    style runtime fill:#cce5ff,stroke:#0066cc
```
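A minimal sketch of the reverse dependency maps that drive the runtime-editing path above; the map names mirror the diagram and the get_library_users naming from the commits, but the document type and invalidation hook are simplified stand-ins:

```python
from collections import defaultdict


class ReverseDependencyMaps:
    def __init__(self) -> None:
        # library/variables source -> documents that import it
        self._library_users: dict[str, set] = defaultdict(set)
        self._variables_users: dict[str, set] = defaultdict(set)

    def register_library_import(self, doc, library_source: str) -> None:
        self._library_users[library_source].add(doc)

    def get_library_users(self, library_source: str) -> set:
        """O(1): the documents to invalidate when this library changes."""
        return self._library_users.get(library_source, set())


# On a file-change event, only the affected documents are rebuilt:
# for doc in maps.get_library_users(changed_source):
#     invalidate_namespace(doc)  # hypothetical targeted-invalidation hook
```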
Cache Validation Chain

```mermaid
sequenceDiagram
    participant LS as Language Server
    participant Cache as Disk Cache
    participant FS as File System
    Note over LS,FS: Warm Start Validation
    LS->>Cache: Load .cache.pkl
    Cache-->>LS: (meta, spec) tuple
    LS->>FS: stat(source_file)
    FS-->>LS: mtime, size
    alt mtime matches
        alt size matches
            Note over LS: Skip content hash!<br/>(Fast path)
            LS->>LS: Check python_executable
            LS->>LS: Check sys_path_hash
            alt Environment matches
                LS->>LS: Restore from cache ✓
            else Environment changed
                LS->>LS: Rebuild namespace
            end
        else size differs
            LS->>FS: Read first+last 64KB
            FS-->>LS: content chunks
            LS->>LS: Compute tiered hash
            alt Hash matches
                LS->>LS: Restore from cache ✓
            else Hash differs
                LS->>LS: Rebuild namespace
            end
        end
    else mtime differs
        LS->>LS: Rebuild namespace
    end
```
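A minimal sketch of the validation chain above, with a simplified CacheMeta stand-in for NamespaceMetaData and the python_executable/sys_path_hash environment checks omitted; the tiered hash reads only the first and last 64 KB, matching the diagram:

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path

CHUNK = 64 * 1024  # first + last 64 KB, per the diagram


@dataclass(frozen=True)
class CacheMeta:  # simplified stand-in for NamespaceMetaData
    mtime_ns: int
    size: int
    content_hash: str


def tiered_hash(path: Path) -> str:
    """Hash only the head and tail of the file instead of its full content."""
    size = path.stat().st_size
    h = hashlib.sha256()
    with path.open("rb") as f:
        h.update(f.read(CHUNK))
        if size > 2 * CHUNK:
            f.seek(size - CHUNK)
            h.update(f.read(CHUNK))
    return h.hexdigest()


def is_cache_valid(meta: CacheMeta, source: Path) -> bool:
    """mtime+size fast path; compute the tiered hash only when size differs."""
    st = source.stat()
    if st.st_mtime_ns != meta.mtime_ns:
        return False  # mtime differs: rebuild namespace
    if st.st_size == meta.size:
        return True  # fast path: skip content hashing entirely
    return tiered_hash(source) == meta.content_hash
```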